The Identification of Spelling Variants in English and German Historical Texts: Manual or Automatic?
Authors
Dawn ARCHER (University of Central Lancashire); Andrea ERNST-GERLACH, Sebastian KEMPKEN, Thomas PILZ (Universität Duisburg-Essen); Paul RAYSON (Lancaster University)
Similar resources
Detecting spelling variants in non-standard texts
Spelling variation in non-standard language, e.g. computer-mediated communication and historical texts, is usually treated as a deviation from a standard spelling, e.g. 2mr as a non-standard spelling for tomorrow. Consequently, in normalization – the standard approach of dealing with spelling variation – so-called non-standard words are mapped to their corresponding standard words. However, the...
Information Access to Historical Documents from the Early New High German Period
With the new interest in historical documents, insight grew that electronic access to these texts causes many specific problems. In the first part of the paper we survey the present role of digital historical documents. After collecting central facts and observations on historical language change, we comment on the difficulties that result for retrieval and data mining on historical texts. In the...
Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization
In this paper, we argue that comparable collections of historical written resources can help overcome typical challenges posed by heritage texts, enhancing spelling normalization, POS-tagging and subsequent diachronic linguistic analyses. Thus, we present a comparable corpus of historical German recipes and show how such a comparable text collection together with the application of innovative ...
Speech technology-based framework for quantitative analysis of German spelling errors in freely composed children's texts
Reading and writing are core competencies of any society. In Germany, international and national comparative studies such as PISA (Programme for International Student Assessment) or PIRLS (Progress in International Reading Literacy Study; IGLU in German) have shown that around 25% of German school children do not reach the minimal competence level necessary to function effectively in society by ...
POS Tagging for Historical Texts with Sparse Training Data
This paper presents a method for part-of-speech tagging of historical data and evaluates it on texts from different corpora of historical German (15th–18th century). Spelling normalization is used to preprocess the texts before applying a POS tagger trained on modern German corpora. Using only 250 manually normalized tokens as training data, the tagging accuracy of a manuscript from the 15th cen...
Journal title:
- LLC
Volume 23, Issue
Pages -
Publication date 2008